Evidence Principles for Education Decision-Making

Executive Summary

District and school leaders face many claims about what programs, curricula, tutoring models, and education products can deliver. The challenge is to determine what evidence those claims rest on, what that evidence does and does not establish, and what to do next when the evidence is thin or mixed.

This report sets out evidence principles to help decision-makers interpret claims, distinguish evidence about implementation, outcomes, and transfer, and decide what to do next, whether that means purchasing, piloting, renewing, scaling, or ending a program, product, curriculum, or policy.

The principles cover:

  • Implementation: What was delivered and what it takes to deliver it, including staffing, time, training, and supports (Principle 1)
  • Evidence strength: How strong and complete the evidence base is, and whether it supports an effectiveness claim or describes patterns (Principles 2–3)
  • Outcomes and interpretation: Whether outcomes match the decision and how to interpret findings, including effect sizes and practical vs. statistical significance (Principles 4–6)
  • Transfer: What it means to use evidence across settings and why multiple contexts matter (Principles 7–8)
  • Personalization claims: How to evaluate “works best for…” claims (Principle 9)
  • Ongoing learning: What to monitor after adoption (Principle 10)

To support use in procurement, piloting, and continuous improvement, this report includes an Evidence Receipt: a one-page template that records the treatment and comparison, target population, outcomes, design, core results, implementation conditions, limitations, and the next evidence step.

A district can use an Evidence Receipt to document the basis for a recommendation, compare options during procurement, and revisit the same questions at renewal.


Introduction

Districts face high-stakes decisions on tight timelines, yet research is often presented in ways that make it difficult to discern what a claim is based on and what it can support. Summaries and “evidence-based” labels may omit what the program was compared to, which outcomes were measured, and what implementation supports were required.

Badges, tiers, and approved lists can signal that a program has “evidence,” but they often do not make clear the limits of that evidence, what was delivered, or what it would take to implement the program well.

This report does not rank products or certify “what works.” It offers a set of evidence principles to help education leaders:

  • Interpret research claims
  • Distinguish evidence about implementation, outcomes, and transfer
  • Decide what to do next when evidence is incomplete or mixed

Adopting a new program takes time, money, and organizational effort. Change should be justified by a clear need and by evidence that the new approach is likely to improve outcomes relative to current practice.

At a basic level, decisions about adopting a new program come down to two questions:

  1. What is required to implement the solution?
  2. Will implementing this solution improve outcomes?

The principles are intended to help leaders interpret the information they are given, ask for what is missing, and document the basis for a decision.


How to Use This Report: The Evidence Receipt

This report includes an Evidence Receipt, a one-page template for recording what the evidence supports, what it does not support, and what evidence is needed next.

Decision-makers can use an Evidence Receipt to:

  • Compare options during procurement
  • Document the basis for a recommendation
  • Revisit decisions at renewal

Begin by stating the claim you are evaluating. A claim about implementation is different from a claim about outcomes, and a claim about outcomes is different from a claim about tailoring to individuals.

The Evidence Receipt is not a scorecard and does not certify that a program “works.” Its purpose is to make the basis for an evidence claim explicit and surface missing information that affects interpretation.

The template includes:

  • Claim and decision: What claim is being evaluated and what decision it informs
  • Treatment and comparison: What was delivered and what it was compared to
  • Population and setting: Who was included and where the evidence comes from
  • Outcomes and measurement: What outcomes were measured and how
  • Design and credibility: What design supports the claim and key threats to validity
  • Core findings: Results interpreted in practical terms
  • Implementation conditions: Staffing, time, training, and supports required
  • Limits and open questions: What remains uncertain
  • Next evidence step: What would reduce uncertainty
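For districts that track Evidence Receipts in a spreadsheet or database, the template maps naturally onto a structured record. The sketch below is a hypothetical Python representation: the class name `EvidenceReceipt`, its field names, the `missing_fields` helper, and the example entries are illustrative assumptions, not part of the template. Listing blank fields is one simple way to surface missing information before a recommendation is made.

```python
from dataclasses import dataclass, fields


@dataclass
class EvidenceReceipt:
    """Hypothetical record mirroring the nine Evidence Receipt fields."""
    claim_and_decision: str = ""
    treatment_and_comparison: str = ""
    population_and_setting: str = ""
    outcomes_and_measurement: str = ""
    design_and_credibility: str = ""
    core_findings: str = ""
    implementation_conditions: str = ""
    limits_and_open_questions: str = ""
    next_evidence_step: str = ""

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are empty or contain only whitespace."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical, partially completed receipt for illustration only.
receipt = EvidenceReceipt(
    claim_and_decision="Tutoring model raises grade 4 math scores; informs pilot decision",
    treatment_and_comparison="Small-group tutoring vs. business-as-usual instruction",
)
print(receipt.missing_fields())
```

Here seven of the nine fields are still blank, which signals that the receipt is not yet ready to support a recommendation.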

We return to the Evidence Receipt in the case studies to show how it can be completed and used to guide decisions.


What Comes Next

The next section presents ten evidence principles to help interpret claims and guide decisions. Each principle explains what to look for, why it matters, and how to apply it in practice.

Evidence Principles

The principles that follow are meant to help education leaders interpret evidence claims in a consistent way and decide what to do next when evidence is thin or mixed. Each principle states a core idea, explains why it matters, and offers questions to guide judgment.


Principle 1: Program descriptions can help assess the full cost of implementation

Question: What is known about the processes and resources required to implement the program as intended?

Why it matters: Program costs go beyond materials or licensing. They include staff time, training, coordination, and monitoring. If implementation requirements exceed local capacity, results are likely to disappoint, even if the program looked promising elsewhere.

Questions and Considerations

  1. Is the program meaningfully different from current practice?
    If not, improving existing programs may be more effective than switching.
  2. Who has to do what, and how will you know it is happening?
    Clarify roles, routines, and monitoring systems.
  3. Will people actually use the program as expected?
    Low participation or resistance can undermine effectiveness.
  4. What are the core components vs. flexible elements?
    Changes to core components may alter the program’s impact.
  5. What resources are required, and what will this replace?
    Include time, staffing, training, and opportunity costs.

Key takeaway: Before relying on outcome evidence, confirm the program can be implemented as intended.
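To make the point about full cost concrete, the sketch below totals a hypothetical program's annual costs. Every figure (20 teachers, a $45 hourly rate, $12,000 in licenses, 500 students) is an illustrative assumption, and the `CostLine` and `total_cost` names are hypothetical. Opportunity costs, such as the instructional time the program displaces, are not priced here and would need to be added separately.

```python
from dataclasses import dataclass


@dataclass
class CostLine:
    item: str
    amount: float  # annual cost in dollars


def total_cost(lines: list[CostLine]) -> float:
    """Sum all cost lines, not just the license fee."""
    return sum(line.amount for line in lines)


# All figures are hypothetical placeholders, not estimates for any real program.
teachers = 20
hourly_rate = 45.0  # assumed loaded hourly cost of teacher time
lines = [
    CostLine("licenses", 12_000.0),
    CostLine("initial training (12 h per teacher)", teachers * 12 * hourly_rate),
    CostLine("weekly coordination (1 h per teacher, 36 weeks)", teachers * 36 * hourly_rate),
    CostLine("coach monitoring time", 8_000.0),
]
students = 500

total = total_cost(lines)
license_share = lines[0].amount / total
print(f"Total: ${total:,.0f}; per student: ${total / students:,.2f}; "
      f"licenses are {license_share:.0%} of total")
```

In this made-up example, licenses account for less than a fifth of the total; most of the cost is staff time.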


Principle 2: The rigor and comprehensiveness of evidence matter

Question: How rigorous is the evidence, and does it support claims about outcomes?

Why it matters: Not all evidence is equally informative. Transparency, independence, and completeness affect how much confidence you should place in findings.

Questions and Considerations

  1. Are full studies available, not just summaries?
  2. Are there studies of the current version of the program?
  3. Are evaluations conducted independently?
  4. Is implementation described in detail?
  5. Were methods specified in advance (e.g., pre-registration)?
  6. Are limitations clearly discussed?

Key takeaway: Strong evidence is transparent, complete, and supported by independent work.


Principle 3: Demonstrating effectiveness requires a point of comparison

Question: Does the study design support causal claims?

Why it matters: Demonstrating effectiveness requires comparing outcomes with what would have happened without the program (the counterfactual).

Example: Improvement over time alone does not show effectiveness, because students typically improve over a school year even without a new intervention.

Questions and Considerations

  1. Is there a comparison group?
  2. Were participants randomly assigned or well-matched?
  3. Does the study address selection bias?

Key takeaway: Without a credible comparison, treat results as descriptive—not causal.
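One simple way to see why a comparison group matters is to separate growth that would have happened anyway from growth attributable to the program. The sketch below uses hypothetical mean scores and a difference-in-differences calculation. It assumes both groups would have grown at the same rate without the program, an assumption that is most credible when participants were randomly assigned.

```python
# Hypothetical mean scale scores; not data from any study.
treated_pre, treated_post = 200.0, 215.0
comparison_pre, comparison_post = 201.0, 213.0

pre_post_gain = treated_post - treated_pre          # what a pre/post summary reports
comparison_gain = comparison_post - comparison_pre  # growth that happened without the program
estimated_effect = pre_post_gain - comparison_gain  # difference-in-differences estimate

print(pre_post_gain, comparison_gain, estimated_effect)
```

A pre/post summary would report a 15-point gain, but only 3 points exceed the growth the comparison group showed without the program.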


Principle 4: Relevance of findings depends on the outcomes measured

Question: Do the outcomes align with the decision?

Why it matters: Even strong evidence does not inform a decision if it concerns outcomes that are not relevant to that decision.

Example: Gains on a narrow skill (e.g., fractions) may not translate to overall achievement.

Questions and Considerations

  1. Do outcomes match the claims being made?
  2. Are measures valid and reliable?
  3. Do short-term outcomes predict longer-term outcomes?

Key takeaway: Evaluate whether outcomes are meaningful for your goals.


Principle 5: Effect sizes can be deceiving and difficult to compare

Question: How should effect sizes be interpreted?

Why it matters: Effect sizes depend on the population studied, the outcome measure, and the comparison condition, so comparing them across studies can be misleading.

Questions and Considerations

  1. Does the design support causal interpretation?
  2. Are outcomes comparable across studies?
  3. Are populations comparable?
  4. What is the comparison condition?
  5. Is the effect large enough to justify the cost?

Key takeaway: Do not treat effect sizes as a simple ranking of effectiveness.
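A common effect size, the standardized mean difference (Cohen's d), divides the raw difference between groups by the pooled standard deviation of the outcome. The sketch below, with hypothetical numbers, shows that the same 5-point raw difference produces different effect sizes depending on how spread out scores are, which is one reason effect sizes from different populations or tests are hard to compare. Studies may report other variants (for example, Hedges' g, which adds a small-sample correction); the function name is illustrative.

```python
def standardized_mean_difference(mean_t: float, mean_c: float,
                                 sd_t: float, sd_c: float,
                                 n_t: int, n_c: int) -> float:
    """Cohen's d: raw difference divided by the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / pooled_var**0.5


# Same 5-point raw difference in two hypothetical populations.
narrow = standardized_mean_difference(105, 100, 10, 10, 100, 100)  # scores tightly clustered
broad = standardized_mean_difference(105, 100, 20, 20, 100, 100)   # scores widely spread
print(round(narrow, 2), round(broad, 2))
```

The program in the narrower population is not twice as effective; its outcome scores are simply less spread out.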


Principle 6: Practical significance matters more than statistical significance

Question: Is the impact large enough to matter?

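Why it matters: Statistical significance indicates that a result is unlikely to be due to chance alone; it does not indicate whether the effect is large enough to justify the cost or effort.

Example: In a very large study, an improvement of a few hundredths of a standard deviation can be statistically significant yet too small to justify an expensive program.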

Questions and Considerations

  1. Is the effect meaningful in magnitude?
  2. Is the interpretation tied to real-world outcomes?

Key takeaway: Focus on meaningful impact, not just statistical significance.
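Statistical significance depends heavily on sample size. The sketch below approximates a two-sided p-value for a standardized difference between two equal-sized groups, using a normal approximation and a simplified standard error; it is illustrative, not a substitute for the analysis a study reports. The function name and the example values are hypothetical.

```python
import math


def approx_two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a standardized mean difference d
    between two equal-sized groups, using a normal (z) approximation."""
    # Simplified standard error of d; drops a small term that grows with d.
    se = math.sqrt(2 / n_per_group)
    z = abs(d) / se
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)), the two-sided tail area.
    return math.erfc(z / math.sqrt(2))


# Hypothetical studies: a tiny effect in a very large study,
# and a ten-times-larger effect in a small study.
p_large_study = approx_two_sided_p(d=0.03, n_per_group=10_000)
p_small_study = approx_two_sided_p(d=0.30, n_per_group=40)
print(f"d=0.03, n=10,000 per group: p = {p_large_study:.3f}")
print(f"d=0.30, n=40 per group:     p = {p_small_study:.3f}")
```

With these assumed values, the very small effect in the large study is statistically significant, while the ten-times-larger effect in the small study is not. Neither result, on its own, says whether the effect is worth the program's cost.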


Principle 7: Multiple factors influence effectiveness in new settings

Question: Will this work here?

Why it matters: Results depend on context, implementation, and what the program replaces.

Questions and Considerations

  1. How similar is the study context to your setting?
  2. What contextual factors matter?
  3. Are required resources available?
  4. Are results consistent across studies?

Key takeaway: Treat transfer as an empirical question, not an assumption.


Principle 8: Evidence from multiple contexts is more compelling

Question: Does the program work across settings?

Why it matters: A single study cannot show whether its results depend on the specific conditions under which it was conducted.

Questions and Considerations

  1. How many studies exist?
  2. Are results consistent?
  3. Are outcomes comparable?
  4. Is the comparison condition similar?
  5. Are implementation details clear?

Key takeaway: Look for consistent results across multiple contexts.
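When several studies report comparable effect sizes, one common way to summarize them is an inverse-variance weighted average together with a measure of how consistent the results are. The sketch below uses a fixed-effect model, Cochran's Q, and the I² statistic on hypothetical study results. A random-effects model is often preferred when settings differ, and a pooled number is only meaningful if outcomes and comparison conditions are comparable (questions 3 and 4 above). The function name is illustrative.

```python
def pool_fixed_effect(effects: list[float], std_errors: list[float]):
    """Fixed-effect (inverse-variance) pooled estimate with Cochran's Q and I^2."""
    weights = [1 / se**2 for se in std_errors]  # more precise studies count more
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    pooled_se = (1 / total_w) ** 0.5
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    # I^2: share of variation across studies beyond what chance would produce.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical effect sizes from three studies, each with standard error 0.05.
consistent = pool_fixed_effect([0.20, 0.22, 0.18], [0.05, 0.05, 0.05])
inconsistent = pool_fixed_effect([0.45, 0.05, 0.10], [0.05, 0.05, 0.05])
for label, (est, se, q, i2) in (("consistent", consistent), ("inconsistent", inconsistent)):
    print(f"{label}: pooled = {est:.2f} (SE {se:.3f}), Q = {q:.1f}, I^2 = {i2:.0%}")
```

Both sets of studies average roughly 0.20, but the second set is highly inconsistent (I² near 95%), which should prompt questions about what differed across settings rather than reliance on the average.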


Principle 9: Treat “works best for…” claims with skepticism

Question: What evidence supports subgroup claims?

Why it matters: Subgroup findings are often unstable and may reflect chance or implementation differences.

Questions and Considerations

  1. Is there a valid comparison within subgroups?
  2. Was the subgroup analysis pre-specified?
  3. How many subgroup analyses were conducted?
  4. Has the finding been replicated?
  5. Could results reflect implementation differences?

Key takeaway: Do not base high-stakes decisions on unverified subgroup claims.
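One reason subgroup findings are unstable is multiple testing: the more subgroups a study examines, the more likely at least one shows a "significant" difference by chance alone. The sketch below computes that probability under simplifying assumptions (independent tests, a 0.05 significance threshold, and no true subgroup differences). Real subgroup tests are often correlated, so treat the numbers as rough; the function name is illustrative.

```python
def chance_of_false_positive(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent subgroup tests is
    'significant' at level alpha when no true subgroup differences exist."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 10, 20):
    print(f"{k:>2} subgroup tests: {chance_of_false_positive(k):.0%} chance of a spurious finding")
```

Under these assumptions, a study that tests 10 subgroups has about a 40% chance of reporting at least one spurious "works best for" finding.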


Principle 10: Learning about implementation and impact is ongoing

Question: What should be monitored after adoption?

Why it matters: Adoption is not the end of the evidence process. Outcomes depend on implementation and context.

Questions and Considerations

  1. Is the program a good fit for local capacity?
  2. Is there a plan for monitoring implementation?
  3. Are there plans to adjust implementation?
  4. Would a pilot or evaluation be useful?

Key takeaway: Monitor implementation and outcomes, and adjust based on evidence.


What Comes Next

The case studies that follow show how these principles apply in real decision contexts and how an Evidence Receipt can guide decisions when evidence is incomplete or mixed.


Small Steps Create Big Shifts